chrtbl - generate character classification and conversion tables
SYNOPSIS
chrtbl [file]
DESCRIPTION
The chrtbl command creates two tables containing information on character
classification, upper/lowercase conversion, character-set width, and
numeric formatting. One table is an array of (2*257*4) + 7 bytes that is
encoded so a table lookup can be used to determine the character
classification of a character, convert a character [see ctype(3C)], and
find the byte and screen width of a character in one of the supplementary
code sets. The other table contains information about the format of
non-monetary numeric quantities: the first byte specifies the decimal
delimiter; the second byte specifies the thousands delimiter; and the
remaining bytes comprise a null-terminated string indicating the grouping
(each element of the string is taken as an integer that indicates the
number of digits that comprise the current group in a formatted non-
monetary numeric quantity).
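The layout of the numeric formatting table described above can be
illustrated with a short C sketch. The names numeric_info and
decode_numeric are hypothetical, chosen only for this illustration; they
are not part of chrtbl or the C library. The sketch assumes the table has
already been read into memory.

```c
#include <stddef.h>

/* Hypothetical view of the numeric formatting table described above. */
struct numeric_info {
    char decimal_point;     /* byte 0: decimal delimiter */
    char thousands_sep;     /* byte 1: thousands delimiter */
    const char *grouping;   /* bytes 2 onward: null-terminated grouping string */
};

/* Decode a table of len bytes. Returns 0 on success, or -1 if the buffer
 * is too short or the grouping string is not null-terminated. */
int decode_numeric(const char *buf, size_t len, struct numeric_info *out)
{
    size_t i;

    if (buf == NULL || out == NULL || len < 3)
        return -1;
    for (i = 2; i < len && buf[i] != '\0'; i++)
        ;                           /* scan for the terminator */
    if (i == len)
        return -1;                  /* no terminator found */
    out->decimal_point = buf[0];
    out->thousands_sep = buf[1];
    out->grouping = buf + 2;        /* points into buf, not a copy */
    return 0;
}
```

Each byte of the grouping member is read as a small integer, so a table
holding '.', ',', 3, 0 would describe groups of three digits.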
chrtbl reads the user-defined character classification and conversion
information from file and creates three output files in the current
directory. To construct file, use the file supplied in
/usr/lib/locale/C/chrtbl_C as a starting point. You may add entries, but
do not change the original values supplied with the system. For example,
for other locales you may wish to add eight-bit entries to the ASCII
definitions provided in this file.
One output file, ctype.c (a C language source file), contains a
(2*257*4)+7-byte array generated from processing the information from
file. You should review the content of ctype.c to verify that the array
is set up as you had planned. (In addition, an application program could
use ctype.c.) The first 257*4 bytes of the array in ctype.c are used for
storing 32-bit character classifications for 257 characters. The
characters used for initializing these bytes of the array represent
character classifications that are defined in ctype.h; for example, _L
means a character is lowercase and _S|_B means the character is both a
spacing character and a blank. The second 257*4 bytes of the array are
used for character conversion and hold 514 elements of 16 bits each.
These bytes of the array are initialized so that characters for
which you do not provide conversion information will be converted to
themselves. When you do provide conversion information, the first value
of the pair is stored where the second one would be stored normally, and
vice versa; for example, if you provide <0x41 0x61>, then 0x61 is stored
where 0x41 would be stored normally, and 0x41 is stored where 0x61 would
be stored normally. The last 7 bytes are used for character width
information for up to three supplementary code sets.
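As a rough illustration of the array layout and of how a conversion pair
swaps two slots, consider the following C sketch. The enum constants and
the functions conv_init and conv_add_pair are hypothetical names. The
conversion table here is simplified to one entry per character code
indexed directly by that code; the real ordering inside the array is not
specified in this page, so that indexing is an assumption.

```c
#include <stdint.h>

enum {
    NCHARS      = 257,               /* characters covered by the table */
    CLASS_BYTES = NCHARS * 4,        /* 32-bit classification entries */
    CONV_BYTES  = NCHARS * 4,        /* 514 conversion entries, 16 bits each */
    WIDTH_BYTES = 7,                 /* supplementary code set width data */
    CLASS_OFF   = 0,
    CONV_OFF    = CLASS_OFF + CLASS_BYTES,
    WIDTH_OFF   = CONV_OFF + CONV_BYTES,
    TABLE_BYTES = WIDTH_OFF + WIDTH_BYTES
};

/* Simplified conversion table (assumption: indexed by character code). */
void conv_init(uint16_t conv[256])
{
    int c;

    for (c = 0; c < 256; c++)
        conv[c] = (uint16_t)c;       /* default: convert to itself */
}

/* Record an <upper lower> pair by storing each value in the other's slot. */
void conv_add_pair(uint16_t conv[256], unsigned upper, unsigned lower)
{
    conv[upper & 0xff] = (uint16_t)lower;
    conv[lower & 0xff] = (uint16_t)upper;
}
```

With this arrangement a single lookup converts in either direction:
after conv_add_pair(conv, 0x41, 0x61), the slot for 0x41 holds 0x61 and
the slot for 0x61 holds 0x41, while every other slot still holds its own
index.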
The second output file (a data file) contains the same information, but
is structured for efficient use by the character classification and
conversion routines [see ctype(3C)]. The name of this output file is the
value you assign to the keyword LC_CTYPE read in from file. Before this
file can be used by the character classification and conversion routines,
it must be installed in the /usr/lib/locale/locale directory with the
name LC_CTYPE by someone who is super-user or a member of group bin.
This file must be readable by user, group, and other; no other
permissions should be set. To use the character classification
and conversion tables in this file, set the LC_CTYPE environment variable
appropriately [see environ(5) or setlocale(3C)].
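A program can also select the installed tables itself through
setlocale(3C). The following is a minimal sketch, not a prescribed usage:
the helper names select_ctype_locale and count_upper are hypothetical,
and the fallback to the "C" locale is a design choice of this example
rather than something chrtbl requires.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Try to select the named locale for character classification. If it is
 * not installed, fall back to the "C" locale. Returns the name of the
 * locale now in effect. */
const char *select_ctype_locale(const char *name)
{
    const char *got = setlocale(LC_CTYPE, name);

    if (got == NULL)
        got = setlocale(LC_CTYPE, "C");
    return got;
}

/* Count uppercase bytes in s according to the current LC_CTYPE tables. */
size_t count_upper(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isupper((unsigned char)*s))
            n++;
    return n;
}
```

Calling select_ctype_locale with the name of the directory where
LC_CTYPE was installed makes isupper and the related routines consult
the new tables; eight-bit characters added to the input file would then
be classified as defined there.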
The third output file (a data file) is created only if numeric formatting
information is specified in the input file. The name of this output file
is the value you assign to the keyword LC_NUMERIC read in from file.
Before this file can be used, it must be installed in the
/usr/lib/locale/locale directory with the name LC_NUMERIC by someone who
is super-user or a member of group bin. This file must be readable by
user, group, and other; no other permissions should be set. To use the
numeric formatting information in this file, set the LC_NUMERIC
environment variable appropriately [see environ(5) or setlocale(3C)].
The name of the locale where you install the files LC_CTYPE and
LC_NUMERIC should correspond to the conventions defined in file. For
example, if French conventions were defined, and the name for the French
locale on your system is french, then you should install the files in
grouping        string in which each element is taken as an integer that
                indicates the number of digits that comprise the current
                group in a formatted non-monetary numeric quantity.
Any lines with the number sign (#) in the first column are treated as
comments and are ignored. Blank lines are also ignored.
Characters for isupper, islower, isalpha, isdigit, isspace, ispunct,
iscntrl, isblank, isprint, isgraph, isxdigit, and ul can be represented
as a hexadecimal or octal constant (for example, the letter a can be
represented as 0x61 in hexadecimal or 0141 in octal). Hexadecimal and
octal constants may be separated by one or more space and/or tab
characters.
The dash character (-) may be used to indicate a range of consecutive
numbers. Zero or more space characters may be used for separating the
dash character from the numbers.
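The constant and range notation can be sketched with the standard
strtoul function, which accepts both 0x-prefixed hexadecimal and
0-prefixed octal when called with base 0. The function parse_range is a
hypothetical helper and does not reproduce chrtbl's own parser; it is
more permissive than the rules above, since strtoul also accepts decimal
constants and skips any leading white space.

```c
#include <stdlib.h>

/* Parse "lo" or "lo - hi", where each number is a hexadecimal (0x..) or
 * octal (0..) constant. Returns 0 on success and -1 on a syntax error.
 * A single constant yields lo == hi. */
int parse_range(const char *s, unsigned long *lo, unsigned long *hi)
{
    char *end;

    *lo = strtoul(s, &end, 0);      /* base 0 accepts 0x61 and 0141 */
    if (end == s)
        return -1;                  /* no digits were converted */
    while (*end == ' ' || *end == '\t')
        end++;
    if (*end != '-') {
        *hi = *lo;                  /* single constant, not a range */
        return (*end == '\0') ? 0 : -1;
    }
    s = end + 1;                    /* skip the dash */
    *hi = strtoul(s, &end, 0);      /* strtoul skips spaces after the dash */
    if (end == s || *end != '\0' || *hi < *lo)
        return -1;
    return 0;
}
```

For instance, both "0x61" and "0141" parse to the same value, and
"0x41 - 0x5a" parses to a range whose endpoints are 0x41 and 0x5a.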
The backslash character (\) is used for line continuation. Only a
carriage return is permitted after the backslash character.
The relationship between upper- and lowercase letters (ul) is expressed
as ordered pairs of octal or hexadecimal constants: <uppercase_character
lowercase_character>. These two constants may be separated by one or
more space characters. Zero or more space characters may be used for
n1      byte width for supplementary code set 1, required
s1      screen width for supplementary code set 1
n2      byte width for supplementary code set 2
s2      screen width for supplementary code set 2
n3      byte width for supplementary code set 3
s3      screen width for supplementary code set 3
decimal_point and thousands_sep are specified by a single character that
gives the delimiter. grouping is specified by a quoted string in which
each member may be in octal or hex representation. For example, \3 or
\x3 could be used to set the value of a member of the string to 3.
EXAMPLE
The following is an example of an input file used to create the
USA-ENGLISH code set definition table in a file named usa and the
non-monetary numeric formatting information in a file named num-usa.
The error messages produced by chrtbl are intended to be
self-explanatory. They indicate errors in the command line or syntactic
errors encountered within the input file.
NOTES
Changing the files in /usr/lib/locale/C will cause the system to behave
unpredictably.
In IRIX 6.5, the content of the LC_CTYPE locale category was extended to
comply with the XPG/4 standard. The older LC_CTYPE binary format will
not be recognized by the C library. Therefore, all custom-built locales
created under an older version of IRIX must be regenerated with the later
versions of localedef(1) and associated chrtbl(1M)/wchrtbl(1M).